Improved Word-Level Alignment: Injecting Knowledge about MT Divergences

Authors

Bonnie J. Dorr

Lisa Pearl

Rebecca Hwa

Nizar Habash

Abstract

Word-level alignments of bilingual text (bitexts) are not only an integral part of statistical machine translation models, but also useful for lexical acquisition, treebank construction, and part-of-speech tagging. The frequent occurrence of divergences, structural differences between languages, presents a great challenge to the alignment task. We resolve some of the most prevalent divergence cases by using syntactic parse information to transform the sentence structure of one language to bear a closer resemblance to that of the other language. In this paper, we show that common divergence types can be found in multiple language pairs (in particular, we focus on English-Spanish and English-Arabic) and systematically identified. We describe our techniques for modifying English parse trees to form resulting sentences that share more similarity with the sentences in the other languages; finally, we present an empirical analysis comparing the complexities of performing word-level alignments with and without divergence handling. Our results suggest that divergence-handling can improve word-level alignment.
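
The idea of rewriting an English parse tree so that its word sequence sits closer to the other language can be illustrated with a toy sketch. It is not the authors' implementation. It assumes a tree encoded as nested `(label, children)` tuples and a single hypothetical rule table, `LIGHT_VERB_RULES`, that expands the English verb "fear" into "have fear", mirroring the Spanish construction "tener miedo". The tree encoding, rule table, and function names are assumptions made only for this illustration.

```python
# Toy illustration only: the tree encoding, rule table, and function names
# are hypothetical and are not taken from the paper.

# Hypothetical rule table: English verb -> (light verb, noun), chosen to
# mirror Spanish "tener miedo" (literally "have fear").
LIGHT_VERB_RULES = {"fear": ("have", "fear")}


def is_expandable_verb(node):
    """True if node is a V preterminal whose single word has a rule."""
    label, children = node
    return (
        label == "V"
        and len(children) == 1
        and isinstance(children[0], str)
        and children[0] in LIGHT_VERB_RULES
    )


def expand_light_verbs(tree):
    """Return a new tree in which matching verbs become V + NP."""
    label, children = tree
    new_children = []
    for child in children:
        if isinstance(child, str):
            # Leaf word: copy unchanged.
            new_children.append(child)
        elif is_expandable_verb(child):
            light, noun = LIGHT_VERB_RULES[child[1][0]]
            new_children.append(("V", [light]))
            new_children.append(("NP", [("N", [noun])]))
        else:
            # Any other subtree: recurse.
            new_children.append(expand_light_verbs(child))
    return (label, new_children)


def leaves(tree):
    """Collect the words of the tree from left to right."""
    words = []
    for child in tree[1]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


english = ("S", [("NP", [("PRP", ["I"])]), ("VP", [("V", ["fear"])])])
rewritten = expand_light_verbs(english)
print(" ".join(leaves(rewritten)))
```

In this sketch the rewritten word sequence has one token for each of "tengo" and "miedo", so a word-level aligner would no longer need to link one English word to two Spanish words. A real system would need a parser and a much larger, linguistically motivated rule set.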

Similar resources

Multi-align: Combining Linguistic and Statistical Techniques to Improve Alignments for Adaptable MT

The continuously growing MT market faces the challenge of translating new languages, diverse genres, and different domains using a variety of available linguistic resources. As such, MT system adaptability has become a sought-after necessity. An adaptable statistical or Hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches p...

Improving Bitext Word Alignments via Syntax-based Reordering of English

We present an improved method for automated word alignment of parallel texts which takes advantage of knowledge of syntactic divergences, while avoiding the need for syntactic analysis of the less resource rich language, and retaining the robustness of syntactically agnostic approaches such as the IBM word alignment models. We achieve this by using simple, easily-elicited knowledge to produce s...

Title of dissertation: COMBINING LINGUISTIC AND MACHINE LEARNING TECHNIQUES FOR WORD ALIGNMENT IMPROVEMENT

Title of dissertation: COMBINING LINGUISTIC AND MACHINE LEARNING TECHNIQUES FOR WORD ALIGNMENT IMPROVEMENT Necip Fazıl Ayan, Doctor of Philosophy, 2005 Dissertation directed by: Professor Bonnie J. Dorr Department of Computer Science Alignment of words, i.e., detection of corresponding units between two sentences that are translations of each other, has been shown to be crucial for the success ...

DUSTer: A Method for Unraveling Cross-Language Divergences for Statistical Word-Level Alignment

The frequent occurrence of divergences, structural differences between languages, presents a great challenge for statistical word-level alignment. In this paper, we introduce DUSTer, a method for systematically identifying common divergence types and transforming an English sentence structure to bear a closer resemblance to that of another language. Our ultimate goal is to enable more accurate al...

Divergence Unraveling for Word Alignment of Parallel Corpora

We describe the use of parallel text for divergence unraveling in word-level alignment. DUSTer (Divergence Unraveling for Statistical Translation) is a system that combines linguistic and statistical knowledge to resolve structural differences between languages, i.e., translation divergences, during the process of alignment. Our immediate goal is to induce word-level alignments that are more ac...

Publication date: 2002
